Violation of a Leggett-Garg inequality with ideal non-invasive measurements 
The quantum superposition principle states that an entity can exist in two different states simultaneously, counter 
to our 'classical' intuition. Is it possible to understand a given system's behaviour without such a concept? A 
test designed by Leggett and Garg can rule out this possibility. The test, originally intended for macroscopic 
objects, has been implemented in various systems. However to-date no experiment has employed the 'ideal 
negative result' measurements that are required for the most robust test. Here we introduce a general protocol 
for these special measurements using an ancillary system which acts as a local measuring device but which need 
not be perfectly prepared. We report an experimental realisation using spin-bearing phosphorus impurities in 
silicon. The results demonstrate the necessity of a non-classical picture for this class of microscopic system. 
Our procedure can be applied to systems of any size, whether individually controlled or in a spatial ensemble. 



There is a stark contrast between the way we think of the mi- 
croscopic world (which is well described by quantum physics) 
and the way we experience the everyday macroscopic world 
(which appears to follow rules which are altogether more intu- 
itive). There have been a number of proposals for experimen- 
tal tests which pit quantum physics against alternative views 
of reality: for example the theorems of Bell 1 and of Kochen 
and Specker 2. Corresponding laboratory tests have been performed 
and to date support the necessity of quantum physics. 
But even if a quantum description of the microscopic world 
is necessary, we face the equally profound question of un- 
derstanding the relationship between the quantum world and 
our familiar classical experience. Some thinkers, such as Pen- 
rose, suggest that there are as-yet undiscovered physical laws 
which prevent superposition of 'macroscopic' states^. Most 
physicists would agree that sufficiently large objects (such as 
the moon) must indeed "be there" when nobody looks. The 
Leggett-Garg inequality^ was developed in order to address 
this question. The protocol may be applied to systems of 
arbitrary size, thus theories which hold that quantum theory 
breaks down at some particular scale can be experimentally 
tested. 

Limited variants of the Leggett and Garg (LG) test have 
been reported for microscopic objects such as photons^ or 
nuclear spins 7 and for the larger superconducting 'transmon' 
system 8 . The approach presented here represents the first im- 
plementation of LG's powerful 'ideal negative result' mea- 
surement procedure. We describe a general protocol for such 
measurements, introducing an ancillary system 9 which acts as 
a local measuring device. Importantly we can account for im- 
perfect preparation of the measuring device through a quantity 
which we call 'venality'. We find that at some finite venality 
(typically corresponding to a thermal threshold) the LG test 



becomes possible. Our procedure can be employed for any 
physical system where a suitable ancilla can be adequately 
initialised; it thus provides a test for a system of any size, 
whether addressed as part of a spatial ensemble or controlled 
individually. 

For a given system with two suitably defined states, our pro- 
tocol provides the opportunity to invalidate the conjunction of 
the following two beliefs: Macrorealism (MR) - the system 
is always in one of its macroscopically distinguishable states; 
and Non-invasive measurability (NIM) - it is possible in principle 
to determine the state of the system without altering its 
subsequent evolution. A quantum physicist will typically reject 
NIM, but crucially the test requires only that the macrorealist 
accept NIM. In a test of the above assumptions, a 
compelling argument for the non-invasiveness of the measure- 
ments should be made in a language acceptable to a macrore- 
alist. Leggett-Garg inequality violations that have been re- 
ported with weak measurements 5 -™^ employ a measurement 
procedure which may ultimately fail to convince a macrorealist 
that the measurements are indeed non-invasive. Proposals 
for experimentally determining the invasiveness of each 
measurement exist, but we make use of Leggett and Garg's 
arguments for the non-invasiveness of an 'ideal negative result' 
measurement scheme. Other experiments have been 
performed which use the assumption of 'stationarity'. 
This assumption severely narrows the class of macrorealist 
theories which are put to the test (please see Supplementary 
Methods); we do not make this assumption and so our method 
tests a wider class of theories. 

We employ a method which equips a two level system with 
a local measuring device: another two-level system 9 . We refer 
to the system being tested as the 'primary system' and the as- 
sociated measuring device as the 'ancilla'. We consider how 
macrorealists might approach an imperfectly prepared measuring 
device, showing that even an 'adversarial' macrorealist 
who makes the most extreme assumptions about the effects of 
invasive measurements must nevertheless expect certain con- 
straints. Quantum physics predicts that under certain condi- 
tions such constraints can still be violated. We show that al- 
though the primary system may be in a totally mixed state, 
the degree to which the ancilla is correctly initialised directly 
affects one's ability to violate the constraint. We implement 
our protocol experimentally using an ensemble of nucleus- 
electron spin pairs in phosphorus doped silicon. The results 
comprehensively rule out a large range of classical descrip- 
tions for this class of system, which although microscopic rep- 
resents an important step towards performing rigorous tests on 
more macroscopic systems. 

RESULTS 
Three core experiments 

Consider the primary system's two states of interest, labelled 
by $\uparrow$ or by $\downarrow$, undergoing arbitrary dynamics governed 
by a process labelled $U$. If the system is probed at distinct 
times with a measurement which distinguishes one state from 
the other (Figure 1), the degree to which the state of the system 
correlates with itself at the different times may be quantified. 
The two-time correlator $K_{ij} = \langle Q(t_i) Q(t_j) \rangle$ is the 
expected value of the product of the measurement outcomes of 
the observable $Q$ at time $t_i$ and at time $t_j$. If $Q \in \{+1, -1\}$ 
for $\uparrow$, $\downarrow$ respectively, and since the correlator is an average, we 
have $-1 \le K_{ij} \le 1$. Calculating this quantity is straightforward: 
one simply measures at $t_i$, waits, and measures again at 
$t_j$, multiplying the results together to compute $Q(t_i) Q(t_j)$. One 
then averages over many instances of the experiment either by 
repeating it many times, or by employing an array of many 
identical systems, as in a recent test of non-contextuality. 
Although in a spatial ensemble one has no access to individ- 
ual elements, because of the ancillary nature of the measuring 
qubit (each element of the ensemble is coupled to its own), the 
test may still be performed. 
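
The averaging step described above can be sketched in a few lines. This is only an illustration, assuming the outcomes of many runs (or many ensemble members) are available as two equal-length records of +1/-1 values; the function name `estimate_correlator` is hypothetical and not taken from the experiment's analysis code.

```python
import numpy as np

def estimate_correlator(outcomes_i, outcomes_j):
    """Estimate K_ij as the average of the product Q(t_i) * Q(t_j) over runs.

    Both arguments are sequences of +1 / -1 outcomes, one entry per run
    (hypothetical input format, chosen for illustration).
    """
    q_i = np.asarray(outcomes_i)
    q_j = np.asarray(outcomes_j)
    if q_i.shape != q_j.shape:
        raise ValueError("both records must contain one outcome per run")
    if not (np.isin(q_i, (1, -1)).all() and np.isin(q_j, (1, -1)).all()):
        raise ValueError("outcomes must be +1 or -1")
    return float(np.mean(q_i * q_j))
```

Because every product is +1 or -1, the estimate automatically lies between -1 and 1.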

Now consider a family of three experiments, each one beginning 
with a primary system in an identical initial state $\rho_s$ 
and evolving under identical conditions governing the dynamics 
of the state. In the first experiment measurements are made 
at $t_1$ and $t_2$ to determine $K_{12}$. In the same way the second and 
third experiments are used to determine $K_{23}$ and $K_{13}$ (Figure 1). 
We then evaluate the 'Leggett-Garg function' 

$$f = K_{12} + K_{23} + K_{13} + 1. \tag{1}$$

Any macrorealist theory according to which the measurements 
$Q$ are non-invasive must predict $f \ge 0$. This is true 
regardless of how the theory distributes probability 
amongst classical trajectories of the primary system (the 
assumption of 'Induction' is required, see Ref. 17, Supplementary 
Methods). In contrast, according to quantum physics, $f$ 
is negative for a suitably chosen time evolution operator $U$. 
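
The macrorealist bound can be checked by brute force. The sketch below, which is an illustration rather than part of the original analysis, enumerates the eight definite trajectories $(Q_1, Q_2, Q_3)$ and evaluates the Leggett-Garg function of Eq. (1) for each; the helper names are hypothetical.

```python
from itertools import product

def lg_function(k12, k23, k13):
    """Leggett-Garg function f = K12 + K23 + K13 + 1, as in Eq. (1)."""
    return k12 + k23 + k13 + 1

def trajectory_f(q1, q2, q3):
    """f for a single definite trajectory with outcomes q_k in {+1, -1}."""
    return lg_function(q1 * q2, q2 * q3, q1 * q3)

# Evaluate f on all 2**3 = 8 classical trajectories.
trajectory_values = {q: trajectory_f(*q) for q in product((+1, -1), repeat=3)}
min_f = min(trajectory_values.values())
max_f = max(trajectory_values.values())
```

Each trajectory gives either $f = 0$ or $f = 4$, so any probabilistic mixture of trajectories, being an average of these values, cannot produce a negative $f$.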



Ideal negative result measurements 

Following Leggett, we implement measurements of 
$Q$ which, by exploiting MR, are 'extremely natural and plausible' 4 
candidates for non-invasiveness. Imagine a measuring 
device that is physically incapable of interacting with a 
system in state $\uparrow$, but that will (possibly invasively) detect a 
system in state $\downarrow$. Suppose we apply this detector to our system 
and it does not 'click'. The macrorealist infers the system 
is in state $\uparrow$, and was in this state immediately prior to 
measurement - but this information is obtained without any 
interaction. Switching to a complementary measuring device 
that perceives only the $\uparrow$ state allows one to obtain the full 
set of data non-invasively, as long as one always abandons all 
experiments where the detector clicks. 

One must acknowledge that it is impossible to ensure that 
the measurement apparatus does not couple to and disturb 
some other, hidden, degrees of freedom. One cannot exclude 
macrorealist theories involving interactions between hidden 
parts of the system and detector (which in our case would 
have to occur even during a null measurement event). This 
is a general point applying to any LG test: one can only ad- 
dress a subclass of macrorealist theories which hold that such 
irremediable hidden degrees of freedom either do not exist, or 
are not relevant. 

The use of two detector configurations means that the three 
experiments introduced previously are each further resolved 
into a pair of experiments, one for non-invasive measurement 
of $\uparrow$, and one for $\downarrow$ (Figure 1). We utilise either a CNOT gate 
(which will flip the state of the ancilla if the control, i.e. the 
primary system, is in $\downarrow$) or an anti-CNOT gate (which will 
flip the state of the ancilla qubit if the primary is in $\uparrow$; 
Figure 1), in each case post-selecting experimental runs where 
the gate was not triggered (Supplementary Methods). The 
second, final measurement in each experiment need not be 
implemented non-invasively, since the subsequent dynamics 
are irrelevant. Note that it is important that the physical 
implementation of the CNOT (and anti-CNOT) operation is such 
that the primary system receives no perturbation when it is in 
the state associated with a null result. 

Here we set $U = \cos(\theta/2)\,\mathbb{1} + i \sin(\theta/2)\,\sigma_x$. As long as the ancilla 
is correctly initialised, the quantum prediction is $K_{ij} = \cos\theta$ 
for successive measurement times (so that $K_{13} = \cos 2\theta$), 
independent of $\rho_s$, and hence 

$$f = 2\cos\theta + \cos 2\theta + 1, \tag{2}$$

which takes the value $f = -0.5$ for $\theta = 2\pi/3$, violating the 
inequality $f \ge 0$ predicted under MR $\wedge$ NIM. Arguments constraining 
the macrorealist to non-negative values for $f$ also do 
not depend on the primary system's initial state. 
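
The quantum prediction can be reproduced numerically. The sketch below assumes the standard result that, for projective measurements of a $\pm 1$-valued observable, the two-time correlator equals the symmetrised expectation value $\tfrac{1}{2}\langle\{Q(t_i), Q(t_j)\}\rangle$; it takes $Q = \sigma_z$ and applies the rotation $U$ once between successive measurement times. The function names are illustrative, not from the original work.

```python
import numpy as np

SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)  # Q: +1 for up, -1 for down
IDENTITY = np.eye(2, dtype=complex)

def evolution(theta):
    """U = cos(theta/2) 1 + i sin(theta/2) sigma_x."""
    return np.cos(theta / 2) * IDENTITY + 1j * np.sin(theta / 2) * SIGMA_X

def correlator(rho, theta, steps=1):
    """Symmetrised correlator for `steps` applications of U between measurements."""
    u = np.linalg.matrix_power(evolution(theta), steps)
    q_later = u.conj().T @ SIGMA_Z @ u                 # Heisenberg-picture Q
    anticommutator = SIGMA_Z @ q_later + q_later @ SIGMA_Z
    return float(np.real(np.trace(rho @ anticommutator)) / 2)

def lg_quantum(theta, rho):
    """f = K12 + K23 + K13 + 1 with K12 = K23 (one step) and K13 (two steps)."""
    return 2 * correlator(rho, theta, 1) + correlator(rho, theta, 2) + 1
```

Evaluating `lg_quantum(2 * np.pi / 3, IDENTITY / 2)` for the totally mixed state, or for any other density matrix, returns the same value, which illustrates the independence from the initial state noted above.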
FIG. 1: Our full implementation of the LG test requires six sub-experiments. 
If the measurements are non-invasive, the outcome 
statistics of a, a single ideal experiment (where all measurements are 
made in each run) will match those of b, a set of three core experiments 
(where only two measurements are made in each run). The 
actual lab implementation for the second of the three core experiments 
is shown in panel c. Shown in colour are the corresponding 
pulses applied to our experimental coupled-spin system. The primary 
system is driven with radio-frequency pulses (red areas), and 
the CNOT and anti-CNOT operations are each applied with a single 
selective microwave frequency pulse (blue areas). The other two core 
experiments are similarly resolved into a pair of complementary 
sub-experiments. 



Corrupt ancillas 

For any protocol employing a measurement ancilla, its initialisation 
is of fundamental importance. A macrorealist regards 
an imperfectly prepared primary-ancilla qubit pair as a 
statistical mixture of the four states $|\downarrow\downarrow\rangle$, $|\downarrow\uparrow\rangle$, $|\uparrow\downarrow\rangle$ and $|\uparrow\uparrow\rangle$; 
similarly a quantum physicist describes the initial state as a 
density matrix diagonal in the $|\mathrm{system}\rangle|\mathrm{ancilla}\rangle$ basis. Quantum 
mechanically an incorrectly initialised ancilla will give 
rise to an incorrect correlator sign. To the macrorealist it will 
give a false indication that the measurement had been non-invasive, 
allowing a potentially corrupt element through the post-selection. 
We define the venality $\zeta$ as the fraction of the ensemble 
for which the ancilla is incorrectly prepared. Quantum 
physics predicts that each $K_{ij}$ generalises to $(1 - \zeta)K_{ij} - \zeta K_{ij}$, 
leading to 

$$f \to (1 - 2\zeta)(2\cos\theta + \cos 2\theta) + 1. \tag{3}$$



FIG. 2: The bounds on the LG inequality for quantum mechanical 
and macrorealist models depend on the venality in the experiment. 
Plots of the quantum mechanical prediction (white) and lower 
bound of a modified inequality for the a, moderate (blue) and b, 
adversarial (red) macrorealist attitudes as a function of the angle $\theta$ and 
the venality $\zeta$. Where the quantum prediction dips below the macrorealist 
bound it is in principle possible to invalidate the macrorealist 
stance. Note the critical values of $\zeta = 0.25$ and $\zeta = 0.1$ above which 
one cannot exclude macrorealism for the moderate and adversarial 
approaches respectively. 



We identify two macrorealist attitudes pertaining to the effect 
of an invasive measurement. A 'moderate' view is that any 
invasively perturbed systems act in a random way, and so average 
to produce zero net correlation. Then $K_{ij} \to (1 - \zeta)K_{ij}$ 
and so with $g = K_{12} + K_{23} + K_{13}$ and $g \ge -1$ for a macrorealist, 

$$f_{\mathrm{moderate}} = (1 - \zeta)\,g + 1 \ge \zeta. \tag{4}$$

Note $f$ is still constrained to be non-negative. An 'adversarial' 
view is that invasively perturbed elements will, by some 
unidentified process, act in such a manner as to minimise $f$. 
Consequently $K_{ij} \to (1 - \zeta)K_{ij} - \zeta$ so that 

$$f_{\mathrm{adversarial}} = (1 - \zeta)\,g - 3\zeta + 1 \ge -2\zeta. \tag{5}$$

This is the most aggressive stance available to a macrorealist. 
The relevant thresholds are plotted in Figure 2, showing that 
minimising $\zeta$ is crucial for a successful experiment. 
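
The critical venalities can be recovered numerically from the quantum prediction and the two macrorealist bounds. The following is a minimal sketch assuming the forms $f_{\mathrm{QM}} = (1 - 2\zeta)(2\cos\theta + \cos 2\theta) + 1$, $f \ge \zeta$ (moderate) and $f \ge -2\zeta$ (adversarial); the grid resolutions and function names are arbitrary illustrative choices.

```python
import numpy as np

THETAS = np.linspace(0.0, np.pi, 3001)   # rotation angles to scan
ZETAS = np.linspace(0.0, 0.5, 1001)      # venalities to scan

def f_quantum(theta, zeta):
    """Quantum prediction for f at rotation angle theta and venality zeta."""
    return (1 - 2 * zeta) * (2 * np.cos(theta) + np.cos(2 * theta)) + 1

def moderate_bound(zeta):
    """Lower bound on f for the 'moderate' macrorealist."""
    return zeta

def adversarial_bound(zeta):
    """Lower bound on f for the 'adversarial' macrorealist."""
    return -2 * zeta

def critical_venality(bound):
    """Largest scanned venality at which the quantum minimum lies below the bound."""
    violating = [z for z in ZETAS if f_quantum(THETAS, z).min() < bound(z)]
    return max(violating) if violating else None
```

At the optimal angle $\theta = 2\pi/3$ the quantum value is $-0.5 + 3\zeta$, so the scan should return thresholds close to $0.25$ and $0.1$ for the moderate and adversarial bounds respectively.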



Experimental implementation 

To demonstrate an experimental violation of these inequalities, 
we consider an ensemble of phosphorus donors in silicon, 
consisting of electron-nuclear spin pairs. Here the nuclear 
spin is the primary system, while the electron is the 
measurement ancilla. In the high field limit, the eigenstates 
of this spin-1/2 - spin-1/2 system are precisely the four product 
spin states. In thermal equilibrium, and ignoring the weak 
polarisation of the nucleus, these states are populated according 
to the Boltzmann distribution, where the spin states are in 
the ratio $\alpha : 1$ for $\alpha = \exp(-g \mu_B B / k_B T)$. Here $B = 3.357$ T 
is the magnetic field, $g$ is the electron spin's g-factor, $\mu_B$ is the 
Bohr magneton, $k_B$ is Boltzmann's constant and $T$ is the temperature. 
The electron and nuclear spin are coupled through 
a 117.5 MHz hyperfine interaction, which distinguishes each 
individual $|\uparrow\rangle : |\downarrow\rangle$ transition. The electronic (nuclear) transitions 
can be individually addressed using selective microwave 
(radio-frequency) pulses. The unitary nuclear rotation $U$ may 
be performed in a manner which is conditional on the system 
being in the 'correct' ancilla state $\downarrow$ (as a refinement of the 
circuit illustrated in Figure 1) because the postselected data 
will always correspond to the unitary operation $U$ having been 
applied. The correlator sequences applied to this system are 
shown in Figure 3. The final measurement at the end of an 
FIG. 3: Experimental values for the LG function are compared 
with bounds from quantum mechanics and macrorealist theories. 
a, The populations of the four system-ancilla (nucleus-electron) 
states are manipulated with microwave and radio-frequency radia- 
tion. The experimentally determined value of the Leggett-Garg func- 
tion at a static field of B = 3.357 T is plotted b at 2.6 K for a thermal 
initial state and c at 2.7 K with a hyperpolarised initial state. The 
minimum bound for each macrorealist approach is also plotted: blue 
for moderate, red for adversarial. Error bars represent uncertainty in 
measurement of the final state, and the grey point and error bars are 
the result of correcting for known measurement errors (namely the 
population damping effects of the tomography pulse sequence). 



individual correlator sequence is accomplished through population 
tomography. 



Inequality violation 

We performed two experimental tests with results shown in 
Figures 3b and 3c. The first used a simple state in thermal equilibrium 
at 2.6 K with $\zeta = 2\alpha/(2 + 2\alpha) = 0.150$, yielding $f = -0.031$. 
The second used an established hyperpolarisation sequence 20 
from an initial state at 2.7 K. Due to the conditional 
nature of $U$ this technique reduces the venality (please see 
Supplementary Methods) to $\zeta = 2\alpha^2/(1 + \alpha + 2\alpha^2) = 0.056$, 
yielding $f = -0.296$. In the course of our experiments, the 
fidelity of the final state populations with respect to the ideal 
target was never less than 98.9%. Our analysis has made two 
assumptions about the measurement process: firstly, that any 
detector imperfections do not conspire to favour anticorrelations 
preferentially; secondly, as discussed earlier, that our 
null measurements do not influence the correlations through 
some hidden structure of the macrorealist's state. Our results 
then constitute a falsification of MR $\wedge$ NIM for cold nuclear 
spins. 
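
The two venalities quoted above follow directly from the Boltzmann factor. The short sketch below reproduces them from the stated field, temperatures and g-factor ($g = 1.9987$, as given in the Methods); the physical constants are CODATA values supplied here, and the function names are illustrative.

```python
import math

BOHR_MAGNETON = 9.2740100783e-24   # J/T (CODATA value, supplied for this sketch)
BOLTZMANN = 1.380649e-23           # J/K (exact SI value)

def boltzmann_ratio(g_factor, field_tesla, temperature_kelvin):
    """alpha = exp(-g mu_B B / (k_B T)), the electron spin population ratio."""
    return math.exp(-g_factor * BOHR_MAGNETON * field_tesla
                    / (BOLTZMANN * temperature_kelvin))

def thermal_venality(alpha):
    """Venality of the thermal state: zeta = 2 alpha / (2 + 2 alpha)."""
    return 2 * alpha / (2 + 2 * alpha)

def hyperpolarised_venality(alpha):
    """Effective venality after hyperpolarisation: 2 alpha^2 / (1 + alpha + 2 alpha^2)."""
    return 2 * alpha**2 / (1 + alpha + 2 * alpha**2)

zeta_thermal = thermal_venality(boltzmann_ratio(1.9987, 3.357, 2.6))
zeta_hyper = hyperpolarised_venality(boltzmann_ratio(1.9987, 3.357, 2.7))
```

With these inputs the thermal venality comes out close to 0.150 and the hyperpolarised one close to 0.056, placing the first test below the moderate threshold of 0.25 and the second below the adversarial threshold of 0.1.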



DISCUSSION 



Our approach relies upon the 'ideal negative result' measurements 
originally envisaged by LG; we show that such 
measurements are possible through an ancilla. Recognising 
that ancilla preparation will always be imperfect, we account 
for the implications through a quantity termed 'venality'. We 
show that for sufficiently low venality even an 'adversarial' 
macrorealist must concede that his view is inconsistent with 
experimental results. Importantly this approach allows one to 
employ either individually controlled systems or a spatial ensemble, 
and it is applicable to systems of any size. 

For our chosen experimental system, an ensemble of phosphorus 
impurities in silicon, we were able to reach a low temperature, 
high field regime where the venality is low enough 
for our LG test to be feasible. Through the use of high precision 
control techniques, we were indeed able to obtain a result 
representing an unequivocal violation of the inequality. The 
violation of this bound has secured the following profound 
conclusion: all accurate descriptions of systems of this type 
must include a concept similar to that of quantum superposition, 
and/or an exotic notion of measurement similar to that of 
wavefunction collapse. 

While our experimental results relate to a microscopic system, 
we emphasise that our protocol is entirely general in 
terms of the scale of the system and whether it is individually 
controlled. Thus we hope that our work will give rise to a 
series of experiments which probe successively more macroscopic 
entities with the same rigour that we apply here. Ultimately 
such experiments will realise Leggett and Garg's vision 
of establishing whether superpositions of macroscopically 
distinct states are indeed possible. 



METHODS 

Weak measurements versus ideal negative result measurements 

LG tests employ the concept of non-invasive measurement 
in a fundamental way; the approaches one may take 
when seeking an implementation include weak measurement 
or ideal negative result measurement. Weak measurements 
are likely to be regarded by both the quantum physicist and 
the macrorealist as approximations to true non-invasiveness. 
Meanwhile Leggett's concept of negative result measurement 
will seem highly invasive to a quantum physicist but entirely 
non-invasive to a macrorealist. As we are interested in a test 
involving a gap between the predictions of quantum physics 
versus macrorealist theories, it is the latter approach that is 
preferable. The weak measurement approach cannot be altered 
to take account of the amount of invasiveness by defining 
something like the venality (which is a measure of how 
often a non-ideal measurement is applied and not a measure 
of the invasiveness of a given measurement). A back action is 
imparted for each and every run of the experiment, and so the 
so-called 'clumsiness loophole' 12 cannot be closed this way. 



Sample preparation 

Si:P consists of an electron spin $S = 1/2$ ($g = 1.9987$) coupled 
to the nuclear spin $I = 1/2$ of $^{31}$P through an isotropic 
hyperfine coupling of $a = 4.19$ mT. The W-band EPR signal 
comprises two lines (one for each nuclear spin projection 
$M_I = \pm 1/2$). Our experiments were performed on the low-field 
line of the EPR doublet corresponding to $M_I = 1/2$. At 
2.6 K and 3.36 T, the electron and nuclear spin $T_1$ were measured 
to be approximately 1 s and 100 s, respectively. 

The sample consists of a $^{28}$Si-enriched single crystal about 
0.5 mm in diameter with a residual $^{29}$Si concentration of order 
70 ppm, produced by decomposing isotopically enriched 
silane in a recirculating reactor to produce poly-Si rods, followed 
by floating zone crystallisation. Phosphorus doping of 
$\sim 10^{14}$ cm$^{-3}$ was achieved by adding dilute PH$_3$ gas to the 
Ar ambient during the final float zone single crystal growth. 
Further information on the sample growth has been reported 
elsewhere. 

Pulsed EPR experiments were performed using a W-band 
(94 GHz) Bruker Elexsys 680 spectrometer equipped with a 
6 T superconducting magnet and a low temperature helium-flow 
cryostat (Oxford CF935). The cryostat was pumped to 
achieve a temperature of 2.6 K (internal thermocouple). Typical 
pulse times were 56 ns (288 ns) for a MW1 (MW2) $\pi$ pulse 
and 90 $\mu$s for an RF $\pi$ pulse. 

Spin resonance experiments 

Both the conditional nuclear operation, and also the non-invasiveness 
of the measurement operation performed by the 
ancilla electron spin, require that the magnetic resonance 
pulses are selective to a high degree. The electron and nuclear 
spin resonance frequencies are separated by $\sim 10$ and $\sim 10^4$ 
times the pulse excitation bandwidth respectively, so we may 
rule out excitation of non-resonant spin transitions (please 
see Supplementary Methods). The spin-relaxation lifetimes at 
2.6 K are orders of magnitude longer than the total experiment 
time of 450 $\mu$s, and so we expect (and observe) no population 
shifts due to relaxation on these timescales. 

The Leggett-Garg function $f$ is a linear combination of populations, 
which can be considered as diagonal entries in a 
density matrix. Using magnetic resonance, only population 
differences can be measured. This leads to an 'observable' 
(or 'pseudopure') component which can be manipulated by 
an experimentalist, and an 'unobservable' component, made 
up of populations common to all eigenstates. For each of 
the six sub-experiments, a four dimensional 'pseudopure' ma- 
trix was measured, which was then added to an appropriately 
scaled identity component determined by the local magnetic 
field and temperature of the sample (representing the unmea- 
surable component of the ensemble). A baseline measure- 
ment was taken as an average of 2000 samples, and all data 
sets were baseline-corrected before processing. The popula- 
tion differences were measured by an average of 200 samples 
and scaled with respect to a measured thermal amplitude (also 
taken as an average over 200 samples), and adjusted to have 
unit trace with the addition of an appropriately scaled identity 
matrix. 



Error analysis 

The errors corresponding to each population were calcu- 
lated according to the standard error of the direct difference 
measurements. These population errors were transformed into 
final Leggett-Garg function uncertainty by a Monte Carlo gen- 
eration of density matrices. The generated matrices deviated 
from the measured matrix in each element by an amount chosen 
randomly from a normal distribution whose standard deviation 
matched that element's error. Once re-normalised, unphysical 
matrices were discarded and statistics on physical 
matrices were collected. In total, 2 matrices were used to 
compile the final uncertainty. This constituted the 'raw' pseu- 
dopure matrix. 
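
The Monte Carlo propagation described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the experiment: the populations are treated as a single vector of diagonal entries, "unphysical" is taken to mean any negative population after re-normalisation, the sample count is an arbitrary default, and `statistic` stands for whatever linear combination of populations is of interest.

```python
import numpy as np

def monte_carlo_uncertainty(populations, errors, statistic, n_samples=4096, seed=0):
    """Propagate per-population errors into the spread of a derived statistic.

    populations, errors : measured diagonal entries and their standard errors
    statistic           : callable mapping a population vector to a number
                          (hypothetical stand-in for the quantity of interest)
    Returns (mean, standard deviation, number of physical samples kept).
    """
    rng = np.random.default_rng(seed)
    populations = np.asarray(populations, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Perturb each element by normally distributed noise with its own width.
    noise = rng.normal(0.0, errors, size=(n_samples, populations.size))
    samples = populations + noise
    # Re-normalise each generated matrix to unit trace.
    samples /= samples.sum(axis=1, keepdims=True)
    # Discard unphysical samples (any negative population).
    physical = samples[(samples >= 0).all(axis=1)]
    values = np.array([statistic(p) for p in physical])
    return float(values.mean()), float(values.std(ddof=1)), len(physical)
```

For example, `monte_carlo_uncertainty([0.4, 0.1, 0.4, 0.1], [0.01] * 4, lambda p: p[0] - p[1])` returns the mean and spread of a population difference together with the number of samples that survived the physicality cut.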

The principal source of error in the population difference 
measurements came from microwave and radio-frequency in- 
homogeneity leading to a spread in applied rotation angles 
across the ensemble. These errors constituted a loss of sig- 
nal for every applied pulse, with a negligible net over- or 
under-rotation. We fit the Rabi oscillations of each of the two 
microwave-frequency rotations and the radio-frequency rotations 
to arrive at an estimate for the signal lost per applied $\pi$ 
rotation in the population tomography sequence. These fits 
were used to estimate the populations without the amplitude-damping 
effects of the tomography sequence, and the uncertainties 
of these fits were used to estimate the uncertainty 
of each population element. These uncertainties were combined 
with the measurement uncertainty before performing 
Monte Carlo simulations as above with $2^{12}$ matrices. This 
enables us to correct for the limitations of the tomography se- 
quence and infer the actual populations before the tomography 
is applied. 

The calculated pseudopure matrix $\rho_{pp}$ was added to the appropriate 
amount of identity matrix $\mathbb{1}$ as determined by the 
sample temperature. The explicit reconstruction is given by 

$$\rho_f = \frac{\alpha}{2(1+\alpha)}\,\mathbb{1} + \frac{1-\alpha}{1+\alpha}\,\rho_{pp}.$$

The diagonal entries of six matrices of this kind were used to 
generate each of the datapoints shown in Figure 3. The value 
for $f$ calculated from raw populations is shown there in black 
and the value for $f$ calculated from populations corrected to 
compensate for the principal tomography errors is shown in 
grey, for both the hyperpolarised and un-hyperpolarised data 
sets. 
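
The reconstruction can be written as a one-line function. The sketch below assumes a four-level system and a unit-trace pseudopure matrix; the function name is hypothetical, and the sample input used in the usage note is an illustration rather than measured data.

```python
import numpy as np

def reconstruct_density_matrix(rho_pp, alpha):
    """Combine a unit-trace 4x4 pseudopure matrix with the thermal identity part.

    rho_f = alpha / (2 (1 + alpha)) * 1 + (1 - alpha) / (1 + alpha) * rho_pp
    """
    rho_pp = np.asarray(rho_pp, dtype=float)
    if rho_pp.shape != (4, 4):
        raise ValueError("this sketch assumes a 4x4 system-ancilla matrix")
    identity_weight = alpha / (2 * (1 + alpha))
    pseudopure_weight = (1 - alpha) / (1 + alpha)
    return identity_weight * np.eye(4) + pseudopure_weight * rho_pp
```

As a consistency check, feeding in `np.diag([0.5, 0.0, 0.5, 0.0])` returns populations proportional to $(1, \alpha, 1, \alpha)$ with unit trace, which is the form of a thermal state in which only the electron spin is polarised.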

There are two conventional measures of state fidelity, 
$\mathcal{F}(\rho_1, \rho_2) = \left( \mathrm{Tr} \sqrt{\sqrt{\rho_1}\, \rho_2 \sqrt{\rho_1}} \right)^2$ or alternatively the more 
generous measure $\sqrt{\mathcal{F}(\rho_1, \rho_2)}$. When applied to physically 
allowed states, both measures are non-negative and reach a 
maximum value of 1 when $\rho_1 = \rho_2$. The fidelity used in the 
main text calculates $\mathcal{F}$ when comparing the gathered density 
matrix with the target density matrices. Examples of gathered 
versus ideal populations are shown in Figure 4. 
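
For reference, the stricter of the two fidelity measures can be computed as sketched below. The matrix square root is taken through an eigendecomposition, which assumes Hermitian, positive semi-definite inputs; the helper names are illustrative.

```python
import numpy as np

def psd_sqrt(matrix):
    """Square root of a Hermitian positive semi-definite matrix via eigh."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(matrix, dtype=complex))
    eigenvalues = np.clip(eigenvalues, 0.0, None)   # remove tiny negative rounding errors
    return (eigenvectors * np.sqrt(eigenvalues)) @ eigenvectors.conj().T

def fidelity(rho1, rho2):
    """F = (Tr sqrt( sqrt(rho1) rho2 sqrt(rho1) ))^2."""
    root1 = psd_sqrt(rho1)
    inner = psd_sqrt(root1 @ np.asarray(rho2, dtype=complex) @ root1)
    return float(np.real(np.trace(inner)) ** 2)
```

For diagonal matrices with populations $p_k$ and $q_k$ this reduces to $(\sum_k \sqrt{p_k q_k})^2$, which is the relevant case when only populations are compared. The more generous measure is simply the square root of the returned value.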
FIG. 4: An example of the measured populations acquired from 
tomography. Orange bars represent diagonal matrix elements at the 
end of the second core experiment. The wireframes are the ideal 
quantum values. The populations were acquired from a, the CNOT 
circuit and b, anti-CNOT circuit. 
Supplementary Methods 
Constraints on macrorealism 

Recall that each of the three core experiments are resolved into a further two sub-experiments, making six experiments in 
all. A macrorealist, under the assumption of non-invasive measurability, will concede that in the six experiments (which are 
performed on an identical initial state, and under the same conditions governing dynamics, i.e. the same Hamiltonian), the 
combined and post-selected results will be entirely equivalent to the family of three core experiments, each pair of circuits 
being equivalent to a single member of that family. Failure to post-select the results of measurements (as in e.g Ref!^ severely 
weakens the argument, and effectively introduces an extra assumption, namely that CNOT gates are always non-invasive. With 
proper post-selection then, the constraints (derived below) that are manifested in the Leggett-Garg inequality apply equally to the 
combined and post-selected results of the six lab experiments as they do to a single ideal experiment. This argument makes use 
of an additional assumption named 'Induction'. This is an assumption about the behaviour of identically prepared and identically 
treated ensembles, and essentially states that causality only runs forwards in time. We take this assumption as self evident and 
so do not state it explicitly in the main paper. Furthermore we believe that this assumption is equally required by experiments 
utilising a spatial ensemble and those using a time ensemble. 

All macrorealist theories are required to predict measurement statistics for the correlators involved in the Leggett-Garg in- 
equality. The underlying theory of macrorealism, if it is to be consistent, must abide by the conservation of probability and other 
consistency conditions. For example consider a general macrorealist theory assigning probabilities $P(Q_1 Q_2 Q_3)$ to each possible 
evolution of the system: 

$$\sum_{Q_1} \sum_{Q_2} \sum_{Q_3} P(Q_1 Q_2 Q_3) = 1,$$

$$P(\uparrow_1 \downarrow_3) = P(\uparrow_1 \uparrow_2 \downarrow_3) + P(\uparrow_1 \downarrow_2 \downarrow_3).$$

Using these conditions each correlator may be calculated from the macrorealist table by choosing the two appropriate rows for 
each two-time correlator (tracing out the column for whichever time is not needed), i.e. : 
One then multiplies each pairwise sum of probabilities by ±1 according to whether that row was a correlation or 
anti-correlation. The lower bound for the Leggett-Garg inequality arises from the frustration of a given state being anti-correlated 
with at most one of the other states (but not both), and the fact that because no single evolution of the system can violate the 
inequality, no statistical sampling will. 
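
The bookkeeping described above, tracing out the unused time and weighting each remaining row by ±1, can be sketched as follows. This is an illustrative helper set rather than the authors' procedure; a macrorealist table is represented as a list of eight probabilities in the order generated by `itertools.product`, with +1 standing for $\uparrow$ and -1 for $\downarrow$.

```python
from itertools import product

# All definite evolutions (Q1, Q2, Q3), with +1 for up and -1 for down.
TRAJECTORIES = list(product((+1, -1), repeat=3))

def pairwise_marginal(table, i, j):
    """Trace out the unused time: P(Q_i, Q_j) from the full table P(Q1 Q2 Q3)."""
    marginal = {}
    for probability, outcome in zip(table, TRAJECTORIES):
        key = (outcome[i], outcome[j])
        marginal[key] = marginal.get(key, 0.0) + probability
    return marginal

def correlator(table, i, j):
    """K_ij: weight each marginal row by +1 (correlated) or -1 (anti-correlated)."""
    return sum(q_i * q_j * p for (q_i, q_j), p in pairwise_marginal(table, i, j).items())

def lg_function(table):
    """f = K12 + K23 + K13 + 1 for a macrorealist probability table (0-based indices)."""
    return correlator(table, 0, 1) + correlator(table, 1, 2) + correlator(table, 0, 2) + 1
```

For instance, a table placing all weight on the single evolution $(\uparrow, \downarrow, \uparrow)$ gives $K_{12} = K_{23} = -1$ and $K_{13} = +1$, so $f = 0$: the third correlator is forced to be positive once the first two are negative, which is the frustration referred to above.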

Each of the classical trajectories can be probed non-invasively, by post-selecting populations from the appropriate circuit. An 
experimenter extracts correlations in the following way, with populations labelled $|\mathrm{system}\rangle|\mathrm{ancilla}\rangle$: 



For consistency we have 



and for example 
The stationarity assumption 

Although others have used it (sometimes implicitly), the additional assumption of stationarity is first given explicitly by 
Huelga et al.: 

". . . the evolution from $t_1$ to $t_2$ is governed by the same stochastic differential equation as the evolution from $t_2$ 
to $t_3$, and this implies stationarity; that is $K(t_1, t_2) = K(t_1 - t_2)$". 

This assumption is often used to redefine the Leggett-Garg Inequality 

$$f = K(\tau) + K(2\tau) \ge -1 \tag{S6}$$

or similar. We note that there exist numerous macrorealist theories (which make predictions by distributing probability in the 
way outlined above) which are capable of violating (S6). Consider a macrorealist theory which has an angle as its hidden variable, 
and flips from one of its states to the other with a probability proportional to the cosine squared of this angle. Such a theory is 
clearly capable of predicting Rabi oscillations. We take it to be an important feature of the original Leggett-Garg inequality that 
it is not violated by such theories. 



Reducing the venality through hyperpolarisation 

The unitary nuclear rotation $U$ may be performed in a manner which is conditional on the system being in the 'correct' ancilla 
state $\downarrow$, because the postselected data will always correspond to the unitary operation $U$ having been applied. If the rotation is 
conditional in this way, one of the two 'bad' populations becomes inactive and will not experience any evolution whatsoever in 
the course of the protocol (specifically state $|\downarrow\uparrow\rangle$ for the CNOT circuits and $|\uparrow\uparrow\rangle$ for the anti-CNOT circuits). The inactive state 
does not participate in the experiment and may be ignored. By minimising the population of the single active bad population we 
can reach a reduced effective venality. If the population distribution of all four energy levels is the same for the initial state of 
both circuits in each pair we have e.g. in the $\{|\downarrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\uparrow\uparrow\rangle\}$ basis 
where $\rho_C$ and its anti-CNOT counterpart are the initial states prepared for the CNOT and anti-CNOT circuits respectively, and
$Z = a + b + c + d$ in both cases. The following expressions describe the lower bounds on the quantum mechanical (QM), moderate
macrorealist (MMR) and adversarial macrorealist (AMR) predictions:
where $g = K_{12} + K_{13} + K_{23}$ and $f = g + 1$. The venality $\zeta = (c + d)/Z$ allows one to write

$g_{\mathrm{QM}} \geq (1 - 2\zeta)(\cos 2\theta + 2\cos\theta)$ (S7)

$g_{\mathrm{MMR}} \geq -(1 - \ldots$ (S8)

$g_{\mathrm{AMR}} \geq -(1 - \zeta) - 3\zeta.$ (S9)
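
As a rough numerical illustration, the quantum and adversarial bounds can be compared as the venality grows. The sketch below reads (S7) as $g_{\mathrm{QM}} \geq (1 - 2\zeta)(\cos 2\theta + 2\cos\theta)$ and (S9) as $g_{\mathrm{AMR}} \geq -(1 - \zeta) - 3\zeta$; the function names, the choice $\theta = 2\pi/3$ (which minimises $\cos 2\theta + 2\cos\theta$ at $-3/2$) and the sample venalities are assumptions made for illustration only. The moderate macrorealist bound (S8) is not included.

```python
import math

def g_qm_bound(zeta, theta):
    # Quantum mechanical lower bound, read from (S7).
    return (1 - 2 * zeta) * (math.cos(2 * theta) + 2 * math.cos(theta))

def g_amr_bound(zeta):
    # Adversarial macrorealist lower bound, read from (S9).
    return -(1 - zeta) - 3 * zeta

# Rotation angle at which cos(2 theta) + 2 cos(theta) reaches its minimum of -3/2.
theta_opt = 2 * math.pi / 3

for zeta in (0.0, 0.02, 0.05, 0.1, 0.2):
    qm = g_qm_bound(zeta, theta_opt)
    amr = g_amr_bound(zeta)
    # A positive gap means the quantum prediction can lie below the adversarial bound.
    print(f"zeta={zeta:.2f}  g_QM={qm:+.3f}  g_AMR={amr:+.3f}  gap={amr - qm:+.3f}")
```

Taking these two expressions as written, the gap at $\theta = 2\pi/3$ is $0.5 - 5\zeta$, so the quantum bound sits below the adversarial one only while $\zeta < 0.1$.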
In thermal equilibrium $(a,b,c,d) = (1,\alpha,1,\alpha)$ and so in general $\zeta = 2\alpha/(2 + 2\alpha)$. When oscillations are only driven on those
primary systems which were paired with a correctly initialised ancilla, one (system, ancilla) state always remains unused throughout
the experiment. We exploit this fact by hyperpolarising the system so that the remaining active state has a lower population
than is possible in thermal equilibrium at a given temperature. If the population distribution is identical across only the three
active levels of the experiment we have



$\rho_C =$



for the CNOT circuit and 
for the anti-CNOT circuit with $Z = a + b + c + d$ as usual. The inactive state is denoted with [ ]. These different initial states,
although physically distinct, are logically identical because the relevant active energy levels have the same population distribution.
The predictions are now
Note that all predictions are independent of the inactive state with population c, except in the normalisation Z. The
normalisation can be arbitrarily scaled without affecting the comparison of the three predictions for g (or for f) since they will
all be affected linearly in the same fashion. We choose to multiply g by $Z/(a + b + 2d)$ so that there is a normalisation of
$a + b + 2d = Z'$ and no longer any dependence on c. This allows us to define the venality as $\zeta = 2d/Z'$ and to recover equations
(S7), (S8) and (S9). This technique is equivalent to supplying the single four level population distribution
to both types of circuit. Using hyperpolarisation we achieve $(a,b,c,d) = (1,\alpha,\alpha,\alpha^2)$ so that $\zeta = 2\alpha^2/(1 + \alpha + 2\alpha^2)$.
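
To make the gain from hyperpolarisation concrete, the two venality expressions quoted in this section can be evaluated side by side: $2\alpha/(2 + 2\alpha)$ for thermal equilibrium and $2\alpha^2/(1 + \alpha + 2\alpha^2)$ after hyperpolarisation. The sketch below is illustrative only; the function names and the sample values of $\alpha$ are assumptions and do not correspond to the experimental conditions.

```python
def venality_thermal(alpha):
    # Thermal equilibrium populations (1, alpha, 1, alpha).
    return 2 * alpha / (2 + 2 * alpha)

def venality_hyperpolarised(alpha):
    # Hyperpolarised populations (1, alpha, alpha, alpha^2), normalised by a + b + 2d.
    return 2 * alpha**2 / (1 + alpha + 2 * alpha**2)

# Sample population ratios, chosen arbitrarily for illustration.
for alpha in (0.5, 0.2, 0.1, 0.05):
    thermal = venality_thermal(alpha)
    hyper = venality_hyperpolarised(alpha)
    print(f"alpha={alpha:.2f}  thermal={thermal:.4f}  hyperpolarised={hyper:.4f}")
```

For any $0 < \alpha < 1$ the hyperpolarised expression is the smaller of the two, since $2\alpha(1 + \alpha) < 1 + \alpha + 2\alpha^2$ reduces to $\alpha < 1$.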



Effect of detuned pulses 

In the ideal scenario, the experimenter applies either the CNOT or anti-CNOT to the primary system-ancilla pair to perform the
non-invasive measurement. In real spin resonance experiments each of the pulses will excite a finite amplitude in the unwanted
transition (i.e. it is not infinitely far off resonance). The post-selection procedure will remove any pairs from the ensemble which
are affected by a microwave pulse, detuned or not; but of course this post-selection is ill-informed for those pairs in which the
ancilla is incorrectly initialised. To allow for this one can simply expand the venality to include a fraction A of the inactive
state population. Note that this A can be made arbitrarily small in spin-resonance experiments, for example by increasing the
duration of the pulses which are applied, or by using a sample with a larger splitting between the two microwave frequencies. In our
experiment A is less than 0.04 and we have confirmed that the corresponding correction to the venality makes little difference to
the degree of violation of our Leggett-Garg inequality.

Note that it is also important that the physical implementation of the CNOT (and anti-CNOT) operations is such that the primary
system receives no perturbation when it is in the state that leaves the ancilla unflipped. It would not be acceptable to implement
the CNOT as a series of low level operations, some of which perturb the primary system, even if their net effect is that of the CNOT
(as is the case for example with a controlled phase gate plus single qubit rotations).